ABSTRACT

This literature review documents the state-of-the-art 
of readability assessment and indicates directions for future 
research in readability measurement. The period since 1953 is 
emphasized, although there is some consideration of earlier work. 
Primary emphasis within the review is placed on readability
measurement; however, a final section is included which reviews 
recent work bearing on the topic of increasing comprehensibility 
through multimodal presentation methods. Various formulas for 
calculating readability are presented and placed in historical 
perspective. The contents include: "Introduction," which presents the 
scope and organization of the review; "Methods for Measuring 
Readability," which discusses rating methods, use tests, readability
formulas, early formulas, detailed formulas, recent formulas, cloze 
procedures, and multimodal presentations; and "Discussion," which 
discusses the estimation of reading levels, formulas, and future 
research. An appendix of various readability measures is also 
included. (WR) 
SUMMARY



Problem

It is expected that the mental aptitude and academic achieve- 
ment levels of enlistees into the United States Air Force may drop 
as the goal of an all-volunteer military force is approached. More- 
over, there is currently a gap between the reading achievement level 
required to read certain Air Force technical literature and the read- 
ing ability levels of enlistees. Accordingly, the Human Resources 
Laboratory initiated a program to define methods to optimize the 
matching of technical training materials to the literacy skills of Air 
Force trainees. This review of the literature relating to methods 
for determining the readability of textual material represents the 
first result of this program. The two remaining aspects of the present
study are: (1) experimental evaluation of modified training materials,
when presented with and without auditory supplementation, and
(2) preparation of a training materials modification handbook, which 
will integrate information from the literature review, the material 
modification effort, and the experiment. 

This literature review is intended to: (1) document the state- 
of-the-art of readability assessment, and (2) indicate directions for 
future research in readability measurement. 

Approach

This report selectively reviews the literature relative to
readability-comprehensibility measurement. The period since 1953 is
emphasized, although there is some consideration of earlier work.

Sources searched for relevant literature included the Psychological
Abstracts, from 1950 through 1971, the Technical Abstract
Bulletin of the Defense Documentation Center, from 1962 through 
1971, and the U. S. Government Research and Development Reports 
of the Department of Commerce, from 1968 through 1971. The 
PASAR automated retrieval system of the American Psychological 
Association was employed to search more completely the literature 
abstracted in the Psychological Abstracts between 1967 and 1970. Of 
course, many additional references were found while reading the pa- 
pers indicated through the above sources. 
Many readability formulas have been derived. The majority
are based on linear regression equations relating various observed
characteristics of text, e.g., sentence length or word length, to some
criterion of comprehension, such as a comprehension test score. All 
of the formulas are highly intercorrelated and are all undoubtedly high- 
ly correlated with cloze score. Cloze score is based on a relatively 
new procedure in which a judgment of readability is based on the per- 
centage of deleted words in textual material which subjects are able 
to replace correctly. Cloze score has gained considerable recent ac- 
ceptance as a readability criterion. But practical considerations may 
make application of one of the many other available formulas more ap- 
propriate in many instances. 

Conclusions 

The readability measurement field suffers from the lack of a 
unifying conceptual or theoretical structure. Clearly, readability is 
multifactor in nature. The dimensions of readability must be deter- 
mined with consideration given to variables within both the text and 
the reader. Upon isolation of these variables, studies to indicate how 
the variables interact in dynamic reading situations will be required. 
While the call for a unifying theoretical structure may seem to evade 
practical issues, it appears that little real progress can be made in 
readability measurement without such a structure. Specific sugges- 
tions for required research are contained within the body of this re- 
port. 
CHAPTER I



INTRODUCTION 




It is expected that the mental aptitude and academic achieve- 
ment levels of enlistees into the U. S. Air Force will drop as the 
goal of an all-volunteer military force is approached (Valentine &
Vitola, 1970). Project 100,000 has also had the effect of lowering
the overall academic achievement level of the Air Force. Accord- 
ingly, it is expected that efforts to increase the comprehensibility of 
written materials, both those intended for training purposes and those 
used on the job, will yield significant advantages. An example of the 
mismatch between the reading level of military personnel and the read- 
ability of the materials they use can be found in the work of Vineberg, 
Sticht, Taylor, and Caylor (1970), who reported that 75 per cent of a
sample of Army Military Occupational Specialties' reading materials
were written at a level six to eight school grades higher than the
reading level of low-level (Category IV) personnel and four to six grade
levels higher than the average reading levels of non-Category IV
personnel. The need to match the reading level of Air Force technical
literature to the reading level of trainees and job incumbents is quite
clear.

Scope and Organization of this Review 

This report reviews the literature relevant to techniques for 
measuring the readability/comprehensibility of written materials.
Use of such techniques could help to improve the intelligibility of Air 
Force reading materials and by implication reduce the gap between 
the reading level of the reader and the material he reads. Primary 
emphasis within the review is placed on readability measurement. However,
a final section is included which reviews recent work bearing on
the topic of increasing comprehensibility through multimodal presenta- 
tion methods. 

The term "readability" may be defined as:



the sum total (including the interactions) of all those elements within a
given piece of printed material that affects the success a group of readers
have with it. The success is the extent to which they understand it, read
it at an optimum speed, and find it interesting (Dale & Chall, 1949). 
Since readability measurement within a training context is the topic
of interest here, only the first part of this comprehensive definition,
readability as it affects ease of understanding or comprehension, will
be considered directly.

Within the measurement concept, "readability formulas," in
which attempts are made to predict the readability of written material
based on quantitative analysis of the material, will be considered first.
These methods have been traditionally based on items such as average 
word length (in letters), average sentence length (in words), frequency 
of occurrence of words not appearing on lists of common words, and 
frequency of occurrence of prepositional phrases and independent 
clauses. Klare (1963) reports that various reviewers have counted 
between 29 and 56 such predictive formulas. However, work on this 
type of formulation has greatly slowed within the past 15 years (Bormuth,
1966; Tannenbaum & Greenberg, 1968). Studies of readability since
that time have tended to concentrate on the "cloze" procedure of W. L.
Taylor (1953). 

The cloze procedure is the second topic of discussion. In this
technique of readability assessment, the percentage of deleted words
in a passage correctly filled in by a reader is taken as an index of
readability. This procedure overcomes the problem of measuring the
readability of technical literature having an unusual, specialized
vocabulary which is familiar to individuals within a specific milieu, the
precise problem of interest in dealing with technical job-related literature.
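
As a rough illustration of the cloze scoring just described, the sketch
below deletes words from a passage and scores a reader's replacements.
The every-fifth-word deletion rule, the exact-match scoring, and the
function names are assumptions made for illustration only; the text
above specifies no more than that the percentage of deleted words
correctly restored serves as the index.

```python
def make_cloze(words, every=5):
    """Blank every `every`-th word; return (mutilated passage, deleted words)."""
    passage, deleted = [], []
    for i, word in enumerate(words, start=1):
        if i % every == 0:
            passage.append("_____")
            deleted.append(word)
        else:
            passage.append(word)
    return passage, deleted


def cloze_score(deleted, responses):
    """Percentage of deleted words restored exactly (case-insensitive).

    Missing responses count as incorrect because the divisor is the
    number of deleted words, not the number of responses.
    """
    if not deleted:
        raise ValueError("no deleted words to score")
    correct = sum(
        d.lower() == r.strip().lower() for d, r in zip(deleted, responses)
    )
    return 100.0 * correct / len(deleted)


# Hypothetical passage and reader responses.
text = "the pilot must check the fuel gauge before the engine is started".split()
passage, deleted = make_cloze(text)
score = cloze_score(deleted, ["the", "a"])  # one of two blanks restored
```

Because the reader supplies the words, a specialized term counts as
correct whenever the reader knows it, which is why no external word
list is needed in this sketch.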
CHAPTER II




METHODS FOR MEASURING READABILITY 



Many approaches are possible to the problem of objectively
measuring the readability or comprehensibility of written prose. The
most elementary methods, e.g., rating methods and use tests,
possess serious drawbacks.



Rating Methods 

In rating methods, judgments of the readability of written manuals
are made by samples which are representative of the intended user
population or by persons considered to be expert in the covered field. 
These judgments are necessarily quite subjective, requiring that a 
large number of raters be employed to "average out" the variability 
between raters. Raters must also be presented with materials cover- 
ing a wide range of comprehensibility, so that they may choose the 
particular materials exhibiting the optimum level of comprehensibility 
for the intended user population. This procedure may be very expen- 
sive to use due to the range of written materials which must be prepared, 
the necessary size of the rating group, and the effort required to inter- 
pret the ratings. Problems may also be encountered in selecting an ap- 
propriate rating group and in generalizing from the rating group to the 
using population. 



Use Tests 

The use test method for evaluating readability involves admin- 
istering comprehension (use) tests to a sample of those for whom the 
material is intended, after the material has been read by the persons 
in the sample. High test scores are assumed to be associated with 
highly readable text, and low scores with less comprehensible text.
In addition to the time required to collect and test the sample of users, 
large amounts of time are required to write tests based on the text. 



Standardization of the tests is impossible, a priori, and specific
tests may well be much too easy or too difficult to rate accurately
the text versions of interest. Moreover, if low scores are attained,
it is not known whether the low test scores can be attributed to the
characteristics of the text itself or, possibly, to the inability of the
tested sample to comprehend in general. Finally, there is little
guidance available regarding whether the tests should test transfer
of factual information, transfer of main points, ability to apply
information, or something else.

The Readability Formulas 

Quantitative analysis of written text has become the most popu- 
lar method of assessing readability. The reliability and economy of 
such methods may be expected to far surpass that of the previously 
mentioned methods. One reviewer (Klare, 1963) reports that between
the publication of the first such formula in 1923 and 1959,
over 29 "readability formulas" were developed.

Quantitative analysis of written text has been conducted in hope 
of finding what determines a readable style. Until very recently, no 
theoretical model of language behavior was available from which to 
generate hypotheses. Hence, the method employed in analysis of text 
has been one of correlation. Characteristically, the text is analyzed 
and possible factors affecting readability (such as sentence length and 
vocabulary measures) are conjectured. 

A set of readings is then collected and ordered by readabil- 
ity according to a specific criterion (such as reading speed, tested 
comprehension, judgment). The chosen variables are then meas- 
ured, their correlations with the readability criteria determined, 
and regression equations written. Until very recently, linear regres- 
sion techniques and an additive model have been used exclusively. In 
generation of the equations, analysts have added factors until
the increase in predictable variance accounted for by adding another
factor was negligible.
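
The correlational step of this procedure can be sketched as follows.
This is a minimal illustration, not a reconstruction of any particular
study: the factor names and all data values are hypothetical, and only
the first stage (correlating each candidate factor with the criterion
and ranking the factors) is shown, not the fitting of the regression
equation itself.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: one value per passage, passages in the same order.
criterion = [3.0, 4.0, 6.0, 8.0, 9.0]  # e.g., judged grade level
factors = {
    "mean_sentence_length": [8.0, 10.0, 14.0, 19.0, 21.0],
    "pct_uncommon_words": [2.0, 5.0, 4.0, 9.0, 12.0],
}

# Correlate each candidate factor with the criterion, strongest first.
correlations = {name: pearson_r(vals, criterion) for name, vals in factors.items()}
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
```

In the procedure described above, factors would then be entered into
the regression equation in turn until an added factor no longer
increased the variance accounted for.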

In recent years, activity in this area has sharply declined 
(Tannenbaum & Greenberg, 1968). We may, however, soon witness
a renewal of interest in measurement of readability as the science of 
psycholinguistics grows. Analysis of readability from a theoretical 
point of view, e.g., Bormuth (1966), may contribute greatly to the
scientific understanding of the written information transfer process. 
Use of electronic data processing equipment in analyzing readability 
is also a great aid in overcoming two of the traditional problems in 
readability analysis: the time and effort required to perform the
analysis by hand, and reliability of measurement, both across time and
across individuals.



Early Formulas

The first important attempt to measure readability objectively
represented an effort to aid teachers in choosing appropriate texts
for their classes. 

In the years around 1920, science teachers at the junior high 
school level complained that the number of technical and scientific 
terms present in textbooks intended for classroom use was becoming 
excessively large. It was becoming necessary to devote large amounts 
of class time to teaching the meanings of the new vocabulary, lessening 
the time available to teach scientific concepts. Lively and Pressey 
(1923) attempted to develop an objective method for determining the vo- 
cabulary difficulty, or "burden, " of textbooks, so that those books with 
exceptionally difficult vocabularies could be avoided. 

Their measure was based on a simplification of Thorndike's
Teacher's Word Book of 10,000 Words, published in 1921. This book
lists the frequency of occurrence of the 10,000 most commonly occurring
words in the English language. Lively and Pressey used a
"weighted median index number" as their measure of vocabulary
difficulty. In order to compute the index, a sample of one thousand
words evenly distributed through a text was taken. The individual
words were found on Thorndike's list, and an index was assigned to
each on the basis of its location in the list. For example, those words
appearing in the most common thousand (according to Thorndike)
received an index number of ten. Those in the second thousand were
assigned index number nine, and so on through the ten thousand-word
blocks in the list. Words not appearing on Thorndike's list were
assigned an index number of zero.



To compute the vocabulary burden, the median value of the 
indices of the words sampled from the text was determined, count- 
ing each value of zero twice. Substantial agreement was obtained 
between the judged difficulty of a wide variety of reading selections, 
ranging from second grade to college level, and the ordering of the 
selections by the weighted median index number. These findings
enabled Lively and Pressey to conclude that the weighted median index
number provided an estimate of the vocabulary difficulty of texts.
Lively and Pressey were aware of the weaknesses in their readability
measure. They pointed out that their index relies on the
appropriateness of Thorndike's word count, and this may not be optimal
for their intended applications, since Thorndike's count appeared to
them to be based largely on materials employing a literary and even 
poetic vocabulary. They also admitted that larger samples than one 
thousand words from an entire book may be more appropriate, al- 
though they made no effort to evaluate the effects of varying sample 
sizes. 
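
The weighted median index number described above can be sketched as
follows. The frequency ranks in the example are hypothetical and are
not Thorndike's data, and representing the list as a word-to-rank
lookup is an assumption made for illustration. Note that under this
scheme a lower index corresponds to a heavier vocabulary burden.

```python
from statistics import median


def index_number(word, rank_of):
    """Index 10 for the most common thousand, 9 for the second, ...,
    1 for the tenth; 0 for words absent from the list."""
    rank = rank_of.get(word.lower())
    if rank is None or rank > 10_000:
        return 0
    return 10 - (rank - 1) // 1000


def weighted_median_index(sample_words, rank_of):
    """Median of the index numbers, counting each zero twice."""
    indices = []
    for word in sample_words:
        value = index_number(word, rank_of)
        indices.extend([value, value] if value == 0 else [value])
    return median(indices)


# Hypothetical frequency ranks (1 = most frequent word).
rank_of = {"the": 1, "water": 350, "engine": 1800, "valve": 4200}
sample = ["The", "water", "engine", "valve", "carburetor"]
burden = weighted_median_index(sample, rank_of)
```

In this small example the indices are 10, 10, 9, 6, and 0 (counted
twice), so the weighted median is 7.5. An actual application would use
the thousand-word sample described in the text.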

In addition to its purely historical significance, the readabil- 
ity assessment procedure of Lively and Pressey is significant because 
it led directly to the first of the "readability formulas" of the modern
type, that of Washburne and Vogel (Chall, 1958), in which various
factors correlating with a selected criterion of readability were combined
into a single multiple regression equation. 

Washburne and Vogel's effort began as one of generating norms
for use with the Lively and Pressey method, a need which was pointed
out by Lively and Pressey when they presented their initial report.
Working in the Winnetka, Illinois, school system, Washburne and Vogel
determined the weighted median index numbers of 700 books reported
as having been read and liked during the preceding year by at least 25
of the 37,000 students in the school system. Categorization scores
comprising a weighted median index number were correlated with
median grade levels derived from the paragraph meaning section of the
Stanford Achievement Test for those students who reported reading
and liking a given book. It was found that a correlation of .80 existed
between grade level and number of "zero-value" words present in the 
tested sample from the corresponding book (Washburne & Vogel, 1926). 
This information was summarized in the Winnetka Graded Book List, 
which received considerable use by parents, teachers, and librarians 
in selecting books to be made available to children in grades 3 through 9 
(Chall, 1958). 
A need was soon realized for a method of determining the
appropriate grade level for books published after the Winnetka list was
prepared, without repeating the massive effort employed in developing
the original list. Accordingly, Washburne and Vogel selected 150 of
the books on the Winnetka list to isolate the internal factors related to
reading difficulty, i.e., factors which might be effective in
distinguishing those books read by lower grade pupils from those read by
students
of higher grades. Ten factors were found, and their correlation with 
grade level was determined. All factors correlated significantly with 
the criterion of grade level, but only four were subsequently used in 
deriving a regression equation for measuring readability, since many 
of the other factors had very high intercorrelations (Vogel & Washburne, 
1928). 

The readability formula developed by Vogel and Washburne is
based on a systematically chosen sample of 1000 words from the book
to be tested, which is analyzed as follows:

x_2 = number of different words appearing
x_3 = number of occurrences of prepositions
x_4 = number of words not on Thorndike's list of 10,000
x_5 = number of simple sentences in 75 sample sentences

The difficulty of the book is expressed in terms of grade levels of the
Stanford Paragraph Meaning Test (X_1). The regression equation
derived was:

X_1 = .085x_2 + .101x_3 + .604x_4 - .411x_5 + 17.43.

This formula correlated .845 with the reading test scores of students
who read and liked the respective books during the previous year. It 
is important to note that the criterion employed here is not strictly com- 
prehensibility of the text. It is confounded by subjective evaluation of the 
books by students and factors such as subject matter and number of pic- 
tures, which are not of direct interest to the question of readability. 
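
As an illustration, the Vogel and Washburne regression equation can be
evaluated directly from the four counts. The function and parameter
names below are assumptions, and the counts in the example are
hypothetical rather than taken from any book on the Winnetka list. The
result is X_1 on the Stanford Paragraph Meaning scale used by the
authors.

```python
def vogel_washburne(different_words, prepositions, uncommon_words, simple_sentences):
    """Evaluate X_1 = .085x_2 + .101x_3 + .604x_4 - .411x_5 + 17.43.

    different_words  -- x_2, different words in the 1000-word sample
    prepositions     -- x_3, occurrences of prepositions in the sample
    uncommon_words   -- x_4, words not on Thorndike's list of 10,000
    simple_sentences -- x_5, simple sentences among 75 sample sentences
    """
    return (
        0.085 * different_words
        + 0.101 * prepositions
        + 0.604 * uncommon_words
        - 0.411 * simple_sentences
        + 17.43
    )


# Hypothetical counts for a single book.
score = vogel_washburne(
    different_words=300, prepositions=100, uncommon_words=20, simple_sentences=40
)
```

For these hypothetical counts the terms are 25.5 + 10.1 + 12.08 - 16.44
+ 17.43, giving 48.67. Only the simple-sentence count lowers the
predicted difficulty; the three vocabulary and structure counts raise it.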



In addition to initiating the use of multiple regression
equations in the study of readability, Washburne and Vogel's study is
significant because of their conceptual approach to the problem. They
were the first to: (1) analyze the effects of structural factors in the
text, (2) employ an objective criterion of textual difficulty as opposed
to qualitative judgment, and (3) describe readability in terms of school
grade levels, as opposed to purely relative measures (Chall, 1958).

The readability measures of Lively and Pressey and of Vogel 
and Washburne are typical of the early studies of readability in that
they: (1) measure readability essentially as a function of vocabulary 
factors, (2) depend heavily on Thorndike's word list, and (3) employ 
relatively crude criteria of difficulty of text. Other less significant 
early attempts to measure readability include those of Johnson (1930), 
Patty and Painter (1931), and Thorndike (1934) himself. 



Detailed Formulas 

The readability studies of Ojemann, Dale and Tyler, and Gray
and Leary are the most significant of the efforts undertaken between 1934
and 1938, a period characterized by evaluation of larger numbers of 
factors potentially related to readability, reduced emphasis on word 
frequency lists, such as that of Thorndike, and concern over more ob- 
jective readability criteria. 

The year 1934 is considered to be the beginning of the trend 
toward detailed analyses of factors relating to readability, in which a 
large number of factors, including qualitative ones, were evaluated. 
The first of these to appear was that of Ojemann (1934), who instituted 
the use of comprehension test scores as the criterion of readability. 

Ojemann collected 16 parent-education passages, each approxi- 
mately 500 words in length. The passages were read by adults, and 
comprehension tests were given covering each passage. A reading achieve- 
ment test was also administered. The grade-level of difficulty of each 
passage was taken as the mean tested reading level of those people who 
correctly answered at least half of the comprehension test questions 
for that selection. After arranging the 16 selections in order of
difficulty, Ojemann analyzed their contents for eight sentence factors, six
vocabulary factors, and three qualitative factors. All of the 14
quantitative factors (sentence factors and vocabulary factors) correlated
significantly with the criterion.
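
Ojemann's difficulty criterion, as described above, can be sketched as
follows: the grade-level difficulty of a passage is the mean tested
reading level of those readers who answered at least half of its
comprehension questions correctly. The reader data and the function
name are hypothetical.

```python
from statistics import mean


def passage_difficulty(readers, n_questions):
    """Mean reading grade level of readers answering at least half correctly.

    readers     -- iterable of (tested reading grade level, questions correct)
    n_questions -- number of comprehension questions on the passage
    """
    passing = [grade for grade, correct in readers if 2 * correct >= n_questions]
    if not passing:
        raise ValueError("no reader answered at least half of the questions")
    return mean(passing)


# Hypothetical (reading grade level, questions answered correctly) pairs.
readers = [(6.0, 4), (8.5, 9), (10.0, 7), (7.5, 5), (5.0, 2)]
difficulty = passage_difficulty(readers, n_questions=10)
```

Here three of the five readers meet the criterion (grade levels 8.5,
10.0, and 7.5), so the passage is assigned a difficulty of about 8.7.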

Of the sentence factors, three exhibited correlations of .60
or above. These were: number of simple sentences, number of
prepositions, and number of prepositions plus infinitives. The five
sentence factors having correlations of less than .60 with the
criterion of difficulty were: number of complete sentences, number of
compound sentences, number of dependent clauses, mean length of
dependent clauses, and ratio of total words in independent clauses to
the total words in a selection.

All six vocabulary factors correlated .60 or higher with the
criterion. These were: percentage of words in Thorndike's first
1000, percentage of words in Thorndike's first 2000, percentage of
words known by 70 per cent of sixth-grade pupils, percentage of
words known by 90 per cent of sixth-grade pupils, mean difficulty
of different words, based on Thorndike's list, and mean difficulty
for each word.



No attempt was made to write a regression equation, since 
the qualitative factors, concreteness versus abstractness of rela- 
tions, obscurity in expression, and incoherence in expression, ap- 
peared to have considerable importance in determining the difficulty 
of the passages, although they could not be quantified. Instead of a 
formula, Ojemann presents his 16 tested selections, with their
respective values for all the quantitative factors and tests of
grade-level difficulty. This type of presentation was intended to allow
evaluation of the qualitative factors in testing new passages, in addition to
comparing new passages according to the quantitative factors. 
In addition to being the first application of comprehension
test scores as a criterion of reading difficulty, Ojemann's study
was the first to employ adult subjects and adult reading materials.
Furthermore, he was the first to demonstrate the importance of
qualitative, nonstatistical factors in the determination of
readability.

Dale and Tyler (1934) evidenced very great concern over the
range of applicability of then-available readability measures. Their
interest was in defining those factors determining the difficulty of
reading materials dealing with health education for adults of
extremely low reading ability. They stated that:



A critical analysis of the widely varying results of previous studies
indicates the impossibility of determining the factors in the reading materials which
make them understandable unless the investigations separate the influence of
factors within the reading materials from those outside. The reader's interest
in the topic treated in the reading matter, his ability to read, the kind of
comprehension appropriate to the purposes of the reading matter, and the difficulty
of the ideas developed in the reading matter are all factors which greatly affect
his comprehension of the material read but are distinct from the characteristics
involved in the materials themselves. . . the various factors not in the reading
materials themselves must be controlled in order to determine the effects of
factors within the materials (pp. 384-385).



In order to isolate the factors which affect reading difficulty,
given a fixed topic, a fixed readership, and a fixed level of
comprehension, Dale and Tyler asked adults of low reading ability to read
health education articles collected from newspapers, magazines, and
books. The readers then completed a multiple-choice comprehension
test designed to measure their understanding and retention of the main
ideas of the selections.

The type of comprehension test employed was one of the most
difficult that could have been used. Questions did not deal with particular
items of information which could be remembered with relative ease
by the subject. Rather, the subject was required to integrate the
entire selection and determine its main point. Additionally, the test
questions were written so that, of the five possible responses to each,
one option was "best" and the others were characterized by varying
degrees of correctness, i.e., accurate statements of secondary points
of the article or slightly inaccurate statements of the points in the
article, requiring the subject to know what was covered in the
selection, and also what was not covered. Good performance on such a
test must have been a formidable task for adults determined to be
reading at the third to fifth grade level, the level of these subjects.
Not surprisingly, comprehension was not adequate on any of the
collected reading materials to determine the correlation between
comprehension and the 29 quantitative factors.

To obtain sufficient comprehension, Dale and Tyler found it
necessary to write their own, much simplified, test passages. Three
principles were found which produced articles with the required ease of
reading: (1) use of very basic vocabulary, (2) use of informal style
characterized by conversational manner and anecdotal examples, and (3)
freedom from digression from the topic of interest.

After rewriting, rescoring, and readministering their materials,
Dale and Tyler found that of the 29 factors studied, 10 correlated
significantly with comprehension. Of these, three were included in a
regression equation for predicting comprehension, i.e., technical words
in the passages (x_2), the number of nontechnical words not known to 90
per cent of sixth grade pupils (x_3) [from an unpublished study by Dale],
and the number of indeterminate clauses (x_4).

The equation predicted the percentage of adults of reading
levels of grade 3 to grade 5 who could understand the main point of a
passage (X_1) based on the above measures, and took the form

X_1 = -9.4x_2 - 0.4x_3 + 2.2x_4 + 114.4.

Predicted comprehension correlated .51 with the criterion.
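
A minimal sketch of how the Dale and Tyler equation would be applied to
a passage follows. The function and parameter names are assumptions,
and the counts in the example are hypothetical. Given the reported
correlation of .51, any such prediction should be read as a rough
estimate only.

```python
def dale_tyler(technical_words, hard_nontechnical_words, indeterminate_clauses):
    """Evaluate X_1 = -9.4x_2 - 0.4x_3 + 2.2x_4 + 114.4.

    X_1 is the predicted percentage of adults reading at grade levels
    3 to 5 who could understand the main point of the passage.
    """
    return (
        -9.4 * technical_words
        - 0.4 * hard_nontechnical_words
        + 2.2 * indeterminate_clauses
        + 114.4
    )


# Hypothetical counts for one passage.
predicted_pct = dale_tyler(
    technical_words=5, hard_nontechnical_words=20, indeterminate_clauses=3
)
```

For these hypothetical counts the terms are -47.0 - 8.0 + 6.6 + 114.4,
or a predicted 66 per cent. The equation is linear and unbounded, so
extreme counts can yield values outside the range 0 to 100.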

A comprehensive study of factors influencing readability was 
performed by Gray and Leary (1935). They listed 82 factors relating 
to readability as drawn from a study of: (1) adults' and children's read- 
ing materials, and (2) recommendations of adult students, teachers, 
and professors of English. Of the 82 factors, 18, such as "image-bearing
words," "physical and psychic association," etc., were discarded
because they seemed to "defy objective measurement." Twenty more
factors were discarded because of their infrequency of occurrence
in the sampled materials. Twenty of the remaining 44 factors
correlated significantly (r > .27) with comprehension test scores of a
sample of 756 adult subjects who were given general adult reading 
materials, both fiction and nonfiction. The subjects were selected 
in such a way as to be representative of the then current population 
of adult readers. 

When Gray and Leary separated out their upper quartile ("good" 
readers) and their lower quartile ("poor" readers), the factors corre- 
lated differently (by group) with the comprehension test scores. It was 
found that vocabulary measures were most highly correlated with com- 
prehension scores for poor readers (those achieving comprehension 
scores in lowest quartile), while readers scoring in highest quartile 
showed comprehension scores to be most closely related to sentence 
structure and length. 

Gray and Leary provided an entire family of regression equations 
relating quantifiable factors to predicted comprehension test scores. 
For low ability readers, as defined by the comprehension test
administered, their initial equation included eight factors, and correlated
.64 with their criterion. Further analysis showed that considerable
reduction in the number of factors included in the equation resulted in
minimal change in the multiple correlation with the criterion measure,
or in the probable error of the estimate. Nine formulas were presented
employing various sets of four of the original eight factors. Multiple
correlations computed for these formulas varied from .6350 to .6402,
and the probable errors of the formulas varied from .2956 to .2975.
These findings allowed Gray and Leary to suggest that in employing
their measure of readability, one of the four-factor formulas be
employed and that the sole criterion for choosing between them be the
ease of measuring the factors required in each formula.

However, the most popular application of Gray and Leary's
results has been based on their five-factor formula, which demonstrates
a correlation of .64 with the criterion, and the lowest probable error
of all (by .0004). This formula takes the form:



X_1 = -.01029x_2 + .009012x_5 - .02094x_6 - .03313x_7
      - .01485x_8 + 3.774

where:

X_1 = predicted average comprehension score, along
      a scale of +4 to -4

x_2 = average number of hard words (words not appearing
      on Dale list of 769 easy words) appearing in
      samples*

x_5 = average number of 1st, 2nd, and 3rd person
      pronouns appearing in sample

x_6 = average sentence length in words

x_7 = average percentage of different words in sample

x_8 = average number of prepositional phrases appearing
      in sample



Such an equation obviously lacks broad utility since it is applicable
only to low ability readers, the sample on which it is based.
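
For illustration, the five-factor formula can be evaluated as in the
sketch below. The coefficient signs follow the form in which this
formula is usually cited, which is an assumption here; the function
and parameter names are likewise assumptions, and the sample values
are hypothetical.

```python
def gray_leary_five_factor(hard_words, pronouns, sentence_length,
                           pct_different_words, prepositional_phrases):
    """Predicted average comprehension score X_1 (scale of +4 to -4).

    Each argument is an average over the sampled passages:
    hard_words            -- x_2, words not on the Dale list of 769 easy words
    pronouns              -- x_5, 1st, 2nd, and 3rd person pronouns
    sentence_length       -- x_6, sentence length in words
    pct_different_words   -- x_7, percentage of different words
    prepositional_phrases -- x_8, prepositional phrases
    """
    return (
        -0.01029 * hard_words
        + 0.009012 * pronouns
        - 0.02094 * sentence_length
        - 0.03313 * pct_different_words
        - 0.01485 * prepositional_phrases
        + 3.774
    )


# Hypothetical sample averages for one book.
score = gray_leary_five_factor(
    hard_words=10, pronouns=8, sentence_length=15,
    pct_different_words=60, prepositional_phrases=12,
)
```

These hypothetical values give a predicted score of about 1.26. Higher
scores indicate easier material, so only the pronoun count raises the
prediction in this form of the equation.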

Three hundred fifty books were tested using the five-factor
formula. The distribution of predicted difficulty of the books was very
nearly normal. Five categories of difficulty were defined and labeled
"A" ("very easy") through "E" ("very difficult"). Predicted
comprehension scores in the "very easy" category ranged from 1.46 to 1.15,
and the "very difficult" scores included values from 0.22 to -0.09.
Correlation of individuals' comprehension scores with their respective
tested reading grade levels indicated that category A materials were
written at approximately the 2nd to 3rd grade levels, category B was
at roughly the 4th grade level, and category C at the 6th grade to junior
high school level. The two highest difficulty categories did not
correlate reliably with reading grade level.



*a 100 word passage from each chapter analyzed. 
Other narrower studies of this period are included in the summary
table in Appendix A and are briefly discussed by Klare (1963).



Less Cumbersome Formulas 



The next trend in readability research, that toward more efficient
and easily applied formulas, saw the development of the two most
popular and widely applied readability measures: the Flesch and
Dale-Chall formulas.

The appropriateness of searching for efficient formulas was 
demonstrated by Lorge (1939). He recomputed correlations and re- 
gression equations based on the structural variables contained in the 
five factor formula of Gray and Leary. Lorge used as criterion ma- 
terials the reading passages of the Standard Test Lessons in Reading 
of McCall and Crabbs. This is a set of 376 reading passages normed 
in terms of number of comprehension test questions correctly answered,
and related to grade levels of the Thorndike-McCall Reading Scale.
Since its initial application by Lorge, this has become the most
frequently used and highly respected criterion in studies of readability
(Klare, 1963).

Using the variables of Gray and Leary, Lorge obtained higher 
correlations with his criterion than had Gray and Leary with any of 
their formulas. He produced two regression equations, each based on 
only two of the Gray-Leary variables, X2 and xg, and X2 and xg (in 
their notation). Multiple correlations of .7406 and .7456, respectively,
were reported. This was the first instance in which over one-half of
the variance in a pure measure of comprehension had been accounted for
by a readability formula. Lorge attributed his high multiple correlations
completely to his use of more adequately standardized criterion materials
and his use of a larger sample of criterion materials: 376 passages,
as opposed to 48 for Gray and Leary.

After study of additional variables, all of which were measures 
of aspects of vocabulary, Lorge concluded that other variables add in- 
significantly to the predictive accuracy attainable from vocabulary alone. 
He suggested that this finding may be attributable to insufficient reliabil- 
ity of available criteria, including that of McCall and Crabbs. But, he 
contended that improved criterion measures would not negate his thesis 
that vocabulary is the most important determinant of reading compre- 
hension. 
Lorge did not present his own readability formula until 1944
(Lorge, 1944). His formulation included two of the Gray-Leary factors,
x_6 (average sentence length in words) and x_8 (number of prepositional
phrases per hundred words), as well as x_9 (ratio of hard words
to total words in the sample). Lorge categorized these as a sentence
structure factor, idea density factor, and vocabulary factor, respectively,
and considered them the primary structural elements relative to
readability. A "hard word," to Lorge, was one that does not appear on
the Dale list of 769 easy words. This list includes words common to
Thorndike's first thousand most frequent words and the first thousand
most frequent words known to children entering the first grade, from
the International Kindergarten List (Gray & Leary, 1935).

Reanalysis of Lorge's data by Dale uncovered certain computational
errors, and Lorge's formula was subsequently recomputed in
1948 (Lorge, 1948). The final formula is:

C_50 = .10 x_9 + .06 x_6 + .10 x_8 + 1.99

where C_50 is the reading grade level of individuals answering one-half
of the McCall-Crabbs questions on that passage correctly. The obtained
predictor-criterion correlation was .67, and addition of other
variables raised the correlation to .705.

Lorge was the first to emphasize that readability formulas in
general are not to be construed as prescriptions for writing, but are
usable only as an approximation of the reading difficulty of a passage.

Rudolf Flesch was strongly dissatisfied with the then available
measures of readability as they could be applied to adult reading
materials (Flesch, 1943a). Readability formulas, when applied to adult
literature, such as magazines, tended to rank the literature in orders
far different from the ordering that was produced by judgment or knowledge
of the educational levels of readers of each magazine. Flesch
thought that this misranking was due to excessive emphasis on range
of vocabulary and insufficient stress on other factors.
Flesch hypothesized three types of factors that should highly
influence readability for adults. First, he suggested that sentence 
length should be an important correlate of adult reading difficulty, 
although it did not appear to be for children. He found support for 
this idea in the work of Gray and Leary (1935), who found sentence 
length to be an important variable associated with readability for 
good readers only. Second, Flesch hypothesized that abstractness was 
correlated with readability for adults. Third, he thought that the read- 
er's interest in the topic of a reading passage should influence its read- 
ability. 

Based on Lorge's data employing the McCall-Crabbs Standard
Test Lessons in Reading, Flesch developed his initial formula, in
which adult reading material would be assigned to a wide range of
school grade levels on the basis of: (1) average sentence length, (2)
number of affixed morphemes (prefixes and suffixes, with certain
exceptions), an index of abstractness, and (3) number of words of personal
reference (pronouns, names, words directly relating to people,
such as aunt, baby, etc.) (Flesch, 1943b). His regression weights
were subsequently recomputed by Lorge when computational errors
were found by Dale (Lorge, 1948).

Almost simultaneously, due to difficulty of application, apparent
lack of sensitivity to the human interest factor, and misuse of
the formula as a rule for writing, Flesch completely revised his formula
and published his "reading ease" and "human interest" formulas
(Flesch, 1948). The two new formulas were again standardized against
the McCall-Crabbs lessons. The "reading ease" formula appeared as
R.E. = 206.835 - .846 WL - 1.015 SL, in which SL is average sample
sentence length in words and WL is word length measured as syllables
per 100 words. Syllable count was substituted for the earlier count of
affixes. The correlation between the two measures was .87, and the
syllable count was expected to be more easily and reliably taken.
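
As an illustration only (this sketch is not part of the reviewed
literature), the reading ease computation can be expressed in a few
lines of Python. The function name, argument names, and sample counts
are hypothetical; the coefficients are those of the formula as given
above, with WL taken as syllables per 100 words and SL as words per
sentence.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Reading ease score from raw counts taken on one text sample."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    sl = words / sentences           # SL: average sentence length in words
    wl = 100.0 * syllables / words   # WL: syllables per 100 words
    return 206.835 - 0.846 * wl - 1.015 * sl


# Hypothetical sample: 100 words, 5 sentences, 150 syllables.
score = flesch_reading_ease(words=100, sentences=5, syllables=150)
print(round(score, 3))
```

With these counts SL is 20 and WL is 150, so the score is about 59.6.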



The "human interest" index is presented as:



H.I. = 3.635 PW + .314 PS



where PW is average percentage of personal words, using a slightly
narrower definition than previously, and PS is the percentage of
sentences spoken or addressed to the reader, including exclamations,
or grammatically incomplete sentences whose meaning must
be determined from the context. This factor tests the "conversational
quality" of the passage, and its inclusion represents an attempt
to bring out the easy readability of direct conversational style. The
reading ease formula correlated .70 with the McCall-Crabbs criterion;
the human interest index correlated .43 with the same criterion.

The new formulas were held by Flesch to locate tested passages
on scales which range from 0 to 100. A reading ease score of zero is
considered to represent "practically unreadable" material, while 100
represents text which is easily read by any literate person. A human
interest score of zero indicates no human interest, and 100 indicates
that the passage is "full of human interest."

The Flesch formulas have become the most widely applied in
the entire history of readability research. This wide application is
due in part to the ease of computation of his formulas and partly to
the wide exposure given to his formulas through a long series of popularized
books (e.g., The Art of Plain Talk [Flesch, 1946], The Art of
Readable Writing [Flesch, 1949], and How to Test Readability [Flesch,
1951]). These books saw wide circulation in business, governmental,
and journalistic circles, where they were employed as rules for writing
(Chall, 1958).

A simplification of the Flesch reading ease formula was proposed
by Farr, Jenkins, and Patterson (1951), who observed a correlation
of .91 between the average number of syllables per word and
the number of one syllable words in a passage. Accordingly, they
suggested that the number of one syllable words be used in Flesch's
formula so as to permit more rapid testing with no loss of reliability.
Farr, Jenkins, and Patterson generated such a formula, and found it to
correlate .95 with scores produced by the Flesch formula. They therefore
concluded that their formula,

Reading Ease = 1.599 (number of one syllable words)

               - 1.015 (mean sentence length in words) - 31.517

should be considered an acceptable substitute for the Flesch formula. 
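
A minimal sketch of the simplified formula follows. It assumes that
the "number of one syllable words" is counted per 100 words of text,
which keeps the result on the same scale as the Flesch score; that
scaling, the function name, and the sample values are assumptions made
for illustration.

```python
def farr_jenkins_patterson(one_syllable_per_100: float,
                           mean_sentence_length: float) -> float:
    """Simplified reading ease from one-syllable word count and sentence length."""
    return 1.599 * one_syllable_per_100 - 1.015 * mean_sentence_length - 31.517


# Hypothetical sample: 70 one-syllable words per 100 words, 20 words per sentence.
fjp_score = farr_jenkins_patterson(one_syllable_per_100=70, mean_sentence_length=20)
print(round(fjp_score, 3))
```

The only count required beyond sentence length is a tally of
one-syllable words, which is the saving in effort the authors claimed.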
Both Flesch (1952) and Klare (1952) immediately criticized
this proposal. Flesch maintained that accuracy would be lost in evaluating
very easy or very difficult materials, and Klare suggested that
the reliability of counting one syllable words should be lower than that
of counting syllables in the manner suggested by Flesch. England,
Thomas, and Patterson (1953) experimentally tested these criticisms
and were able to discount them completely. The Farr, Jenkins, and
Patterson formula has since received a moderate amount of acceptance,
considering the rate at which it is reported as being used in
applied evaluations of materials. Chall (1958) pointed out the irony
of this return to word length as a criterion of difficulty, one of the
very factors Flesch considered unsuitable.

Dale and Chall (1948) employed the Flesch formula to evaluate 
educational materials published by the National Tuberculosis Associa- 
tion in terms of readability for the average adult. They eventually 
sought to develop their own measure, because of two drawbacks encountered
in the use of the Flesch formula. First, they found low between-rater
reliability for the count of affixes necessary for the Flesch
formula. Although Dale and Chall expressed considerable respect for
the justification, presented by Flesch (1943b), for using a count of
affixes as an index of abstractness, they wondered (since affixes correlated
.78 with abstractness) whether or not some other and more manageable
correlate of abstractness could be employed. This question is
supported by Lorge (1944), who stated that all measures of vocabulary
load, including abstractness, are intercorrelated. Second, Dale and
Chall considered the use of personal references in the Flesch human
interest formula to be oversimplified. References to senators, though
personal in the Flesch sense, are not generally associated with a lowering
of abstractness of text, as are references to "Dad" or "John,"
which are generally associated with a very concrete type of statement.

Again using the McCall-Crabbs test data originally collected
by Lorge, Dale and Chall derived a new regression equation. This
equation is based on average sentence length and the "Dale score,"
the relative number of words in the text samples not appearing on
a list of 3000 words known to 80 per cent of a sample of fourth graders.
The form of the equation was:

x_C50 = .1579 (Dale score) + .0496 (sentence length) + 3.6365

in which the criterion, x_C50, is the reading grade level of an individual
able to answer correctly half of the comprehension test questions.
The score yielded by this equation correlated .70 with the McCall-Crabbs
criterion.

Another formula based on the Flesch formula is that of
Gunning (1952). In this formula, reading grade level necessary to
understand tested material is equal to .4 of the sum of mean sentence
length in words and percentage of words of three or more syllables.
No numerical correlation was reported between this and other
estimates of reading difficulty, although there is little reason to
believe that the correlation was low.
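
The rule just described is simple enough to state as a short Python
sketch. The function and argument names and the sample counts are
hypothetical; the factor of .4 and the two summed quantities come from
the description above.

```python
def gunning_grade(words: int, sentences: int, polysyllabic_words: int) -> float:
    """Grade level as .4 * (mean sentence length + % words of 3+ syllables)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    mean_sentence_length = words / sentences
    percent_polysyllabic = 100.0 * polysyllabic_words / words
    return 0.4 * (mean_sentence_length + percent_polysyllabic)


# Hypothetical sample: 100 words, 5 sentences, 10 words of three or more syllables.
grade = gunning_grade(words=100, sentences=5, polysyllabic_words=10)
print(grade)
```

Here the mean sentence length is 20 and the polysyllable percentage is
10, so the predicted grade level is .4 x 30 = 12.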

The McCall-Crabbs Standard Test Lessons in Reading, which
were the basis of the last five formulas discussed, were developed in
1926. They were revised in 1950, and at least 60 of the passages in
this more recent edition are entirely new, dealing with modern topics,
such as atomic energy and aviation (Powers, Sumner, & Kearl, 1958).
Based on this revision of the McCall-Crabbs tests, Powers, Sumner,
and Kearl undertook to recompute the Flesch, Dale-Chall,
Farr-Jenkins-Patterson, and Gunning indices of readability. They hoped
to produce formulas accounting for changes in reading abilities over the
twenty-four year period between test editions, and to facilitate comparison
of the formulas, since they would be based on identical measurement rules,
mathematical procedures, etc.

The recomputed formulas and their respective correlations with
the constant criterion of grade score of pupils answering one-half of the
test questions correctly are:

Flesch:

     -2.2029 + (.0778)(mean sentence length) + (.0455)
     (number of syllables per 100 words)                  r = .6351

Dale-Chall:

     3.2672 + (.0596)(mean sentence length) + (.1155)
     (percentage of words not appearing on Dale list of
     3000)                                                r = .7135

Farr-Jenkins-Patterson:

     8.4355 + (.0923)(mean sentence length)
     - (.0648)(percentage of one syllable words)          r = .5836

Gunning:

     3.0680 + (.0877)(mean sentence length) + (.0984)
     (percentage of words of three or more syllables)     r = .5865



In support of their recalculations, Powers, Sumner, and Kearl
pointed out that the revised formulas give estimates more consistent
with one another than did the original Flesch and Dale-Chall formulas;
the other formulas were not compared in their original forms. It was
concluded that the Dale-Chall formula is the "best," since it had the
highest predictive power (correlation) and the lowest standard error,
.77 grade levels, of the four formulas tested.
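
To illustrate the claim that the recomputed formulas agree more closely
with one another, the sketch below applies three of them to a single
hypothetical text sample. The function names and the sample
measurements are assumptions for illustration; the coefficients are
those listed above for the recomputed Flesch, Dale-Chall, and
Farr-Jenkins-Patterson formulas.

```python
def psk_flesch(mean_sentence_length: float, syllables_per_100: float) -> float:
    """Recomputed Flesch formula, expressed as a grade score."""
    return -2.2029 + 0.0778 * mean_sentence_length + 0.0455 * syllables_per_100


def psk_dale_chall(mean_sentence_length: float, pct_not_on_dale_list: float) -> float:
    """Recomputed Dale-Chall formula, expressed as a grade score."""
    return 3.2672 + 0.0596 * mean_sentence_length + 0.1155 * pct_not_on_dale_list


def psk_farr_jenkins_patterson(mean_sentence_length: float,
                               pct_one_syllable: float) -> float:
    """Recomputed Farr-Jenkins-Patterson formula, expressed as a grade score."""
    return 8.4355 + 0.0923 * mean_sentence_length - 0.0648 * pct_one_syllable


# Hypothetical sample: 20 words per sentence, 150 syllables per 100 words,
# 10 per cent of words off the Dale list, 70 per cent one-syllable words.
estimates = {
    "Flesch": psk_flesch(20, 150),
    "Dale-Chall": psk_dale_chall(20, 10),
    "Farr-Jenkins-Patterson": psk_farr_jenkins_patterson(20, 70),
}
for name, grade in estimates.items():
    print(f"{name}: {grade:.2f}")
```

Because all three are scaled to the same grade-score criterion, their
outputs can be compared directly, which was not possible with the
original reading ease scale.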

The final readability measure of this period to be discussed is 
that of McElroy. This measure was unfortunately called a "fog count,"
the same term applied by Gunning to his formula. This duplication of 
names has caused confusion on the part of some later researchers, 
such as Kincaid (1972). 

McElroy's fog count is, again, related to the Flesch formula
in that it is based upon a count of syllables (Klare, 1963). The procedure
to be followed in using this formula is to assign a value of 1 to
each word of one or two syllables appearing in the sampled passage,
and a value of 3 to each remaining word, which will have three or
more syllables. The assigned values are added, and the value thus determined
is the fog count. To determine the reading grade level associated
with a particular fog count value, if the sum is over 20, it is
divided by 2. If the sum is under 20, 2 is subtracted and the result
is divided by 2.
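
The counting procedure and the conversion to grade level can be
sketched as follows. The function names and the sample syllable counts
are hypothetical. The description above does not say how a fog count of
exactly 20 is treated; this sketch assumes it falls in the lower branch.

```python
from typing import Sequence


def fog_count(syllables_per_word: Sequence[int]) -> int:
    """Score 1 for each word of one or two syllables, 3 for each longer word."""
    return sum(1 if syllables <= 2 else 3 for syllables in syllables_per_word)


def fog_grade(count: int) -> float:
    """Convert a fog count to a reading grade level."""
    if count > 20:
        return count / 2
    # Assumption: a count of exactly 20 is handled like counts under 20.
    return (count - 2) / 2


# Hypothetical passage: eight short words and two words of 3+ syllables.
short_passage = [1, 2, 1, 3, 1, 1, 4, 2, 1, 1]
short_count = fog_count(short_passage)   # 8 * 1 + 2 * 3 = 14
short_grade = fog_grade(short_count)     # (14 - 2) / 2 = 6.0

# Hypothetical passage: fifteen short words and three longer words.
long_passage = [1] * 15 + [3] * 3
long_count = fog_count(long_passage)     # 15 + 9 = 24
long_grade = fog_grade(long_count)       # 24 / 2 = 12.0
```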

No statistical information relating to the development, or ac- 
curacy of this formula is available (Klare, 1963; Kincaid, 1972). 
Recent Formulas



Klare (1963) considers the period 1953 to 1959, when his re- 
view was written, to be one exemplified by a trend toward speciali- 
zation in readability formulas. However, it seems that there is little 
to be gained in naming a trend on the basis of only seven formulas of 
relatively minor importance which appeared in a six year period. Ex- 
tension of the period to include all recent studies of readability meas- 
urement techniques appearing from 1953 to the present does, however, 
appear appropriate and renders impossible the naming of a trend of 
study characterizing this recent period. 

During this period, in addition to formulas intended for a limited
range of application, additional readability measures intended for general
application were developed, and two new approaches to the problem
of measuring readability were presented.

During the last 18 years, four formulas appeared which are intended
for application to primary school texts. These are the formulas
of Spache (1953), Wheeler and Smith (1954), Bloomer (1959), and
Tribe (1956).

The formula of Spache (1953) predicts the primary grade level
(grades 1-3) of textual material on the basis of average sentence length
in words (x_1) and percentage of words not appearing on the Dale list of
769 common words (x_2). The formula predicts grade level as equal to:
.141 x_1 + .086 x_2 + .839. The reported correlation between predicted
score of tested materials and usual grade level of application was .818.

Wheeler and Smith (1954) based their formula for prediction of 
grade level on the grade designation assigned by the publisher of the 
textual materials in their sample. Their formula equated grade lev- 
els to 10 times the product of the mean length of units (sentences, 
with minor exceptions) in words and the percentage of multisyllable 
words. The value thus determined is located in a table and a grade level
of 1 through 4 is read off. This was the first formula to be based on
a multiplicative model of factors. This type of equation allows inter- 
actions between the factors (as discussed by McLaughlin [1969]), so 
that a particular change in factor A will affect the predicted value dif- 
ferently at varying levels of factor B. This may be a highly important 
characteristic for a readability formula, in the light of data such as
that of Gray and Leary (1935), showing that the relative importance
of vocabulary and structural factors does not remain constant across
all reading levels. Multiplicative formulas may be able to deal more
appropriately with findings of this type than do additive formulas.
But it is doubtful that the simplified formula of Wheeler and Smith
is a very large step in the proper direction.

The formula of Bloomer (1959) again used customary grade
level application as the readability criterion. In Bloomer's development,
reading grade level is predicted from abstraction level as indicated
by the number of words per modifier (modifying phrase) and
"sound complexity" (sic) of modifiers. Bloomer contended that these 
variables may be used as predictors of readability, although he pre- 
sented no formula, predictive method, or method for application. 
His approach is based on the observation that abstraction in text in- 
creases with grade level, and that the two variables (words per modi- 
fier and sound complexity) employed are closely associated with ab- 
straction. The multiple correlation between the two variables and 
assigned grade level was .78, which Bloomer considers to compare
favorably with the correlations obtained through the procedures of
Flesch and of Lorge.

The McCall-Crabbs Standard Test Lessons in Reading, 1950
revision, were again used as a criterion by Tribe (1956). Grade level
score of children, in grades 2 through 8, who could correctly answer
one-half of the reading test questions was found to be equal to:

.0719 x_1 + .1043 x_2 + 2.9347

where x_1 is the average sentence length and x_2 is the percentage
(times 100) of words not appearing on the Rinsland word list. A correction
factor is then applied to the predicted grade level. This factor
is based on a table presented by Dale and Chall (1948).
An unusual criterion was employed by Jacobson (1965) to
develop formulas to determine the readability levels of high school
and college chemistry and physics texts. The predicted criterion
measure was the average number of words indicated as not being
understood (as indicated by reader underlining) by readers, based
on 200 word sample passages. This procedure was first employed
in 1928, and is known as the Kyte test. Test-retest reliability of
this test over a one week interim was reported by Jacobson to be
.95 for the physics texts and .85 for the chemistry texts in his
sample. Two formulas are presented, one for physics texts and
one for chemistry texts. The variables included are:



x_1 = mean underlining score (the criterion)

x_2 = mean number of words per independent clause
      in each 200 word sample passage

x_3 = mean number of mathematical terms in the
      sampled passages

x_4 = mean number of words in the sampled passages
      that are above the 6,000 word level in Thorndike's
      20,000 word list

x_5 = mean number of (technical) words per passage
      which do not appear in the Powers list of 1828
      essential scientific terms



The formula derived for use with physics texts is:

x_1 = -.0003 + 29.7059 x_2 [x_3 term and following sign illegible] 35.0029 x_4

which exhibited a correlation with the criterion of .70.



The formula for chemistry texts is:

x_1 = .003 + .1706 x_2 + 13.7231 x_3 - 43.7262 x_4 - 2.3577 x_5



Predictions from this formula correlated .67 with the criterion.

Flesch introduced three additional readability indexes (the term 
"formula" is probably inappropriate here) in 1954 and 1958 (Flesch, 1954, 
1958). All are relatively subjective measures in which counts are made,
based on 100 word samples of text, and the counted values are converted 
to arbitrary scales through reference to conversion tables. All were 
validated by inspection only. 

The "r" score (Flesch, 1954) is an index of realism, based on 
the number of references to specific human beings, their attributes or 
possessions, locations, objects numbered or named, dates, times, 
and colors. 

The "e" score (Flesch, 1954) is an index of energy, based on
indications of voice communication, such as inflection. 

The "formality-popularity" scale (Flesch, 1958) is based on 
the total numbers of: capitalized, underlined or italicized words, num- 
bers (not spelled out), punctuation marks, symbols (#, $, (5, etc.), be- 
ginnings of paragraphs, and endings of paragraphs. 

Forbes developed a readability measure intended for application 
to the instructions and items contained in all types of standardized tests 
and opinion polls, except vocabulary tests (Forbes & Cottle, 1953). To 
determine a test's reading grade level, each word appearing difficult
to the grader is looked up in the 1942 Thorndike Junior Century Dictionary,
and its listed frequency of occurrence, from the first to the
20,000th word, is noted. The total of all indices above 4 is computed
and divided by the total number of words sampled. This vocabulary
index is looked up in a table, which indicates the corresponding grade
level.
They reported that the readability of tests measured using
this procedure correlates .96 with the average of the readability of
the same material calculated using five other procedures, including
the Dale-Chall and Flesch methods. According to Forbes and Cottle,
at the time of this work, no readability measure directly applicable to
test forms was available. However, in view of the reported high correlation
(.96) between this technique and the others, it seems that the
contribution is minimal. Additionally, if the other measurement procedures
are not appropriate to this application, what is the value of the
Forbes and Cottle contribution? No other method of assessing the
readability of the test materials is reported.

Six new readability measures of the traditional form have ap- 
peared in very recent years. The first of these was that of Coleman, 
developed in 1965 and discussed by Szalay (1965). Coleman developed 
a family of four formulas, using one through four measured variables, 
respectively, to predict readability level. The readability criterion 
in this case was the mean "cloze" score on the passage achieved by
a sample of college students. (The "cloze" score is the percentage
of deleted words of a passage that are correctly guessed and written
in by a subject. This approach to readability is discussed in subsequent
paragraphs.)

Correlations among the four formulas and criterion scores
varied between .85 and .91 when tested independently by Szalay and
Coleman (Szalay, 1965). It is therefore recommended by Szalay that
only the simplest formula be employed:

Predicted cloze score = 1.29 (percentage of one syllable words)

                        - 38.45.

Factors present in the other three formulas but not making significant 
additions to predictive ability are: (1) sentence length, (2) frequency 
of occurrence of pronouns, and (3) frequency of occurrence of prepo- 
sitions. 
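
The simplest of the four formulas can be sketched in Python as below.
The function name and the sample percentage are hypothetical; the
coefficient and constant are those of the formula as given above.

```python
def coleman_predicted_cloze(pct_one_syllable_words: float) -> float:
    """Predicted mean cloze score (per cent) from one-syllable word percentage."""
    return 1.29 * pct_one_syllable_words - 38.45


# Hypothetical passage in which 80 per cent of the words have one syllable.
predicted = coleman_predicted_cloze(80.0)
print(round(predicted, 2))
```

A higher percentage of one-syllable words yields a higher predicted
cloze score, that is, an easier passage.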
A very similar formula was developed by HumRRO personnel
for measuring the readability of Army technical literature (Caylor,
Sticht, Fox, & Ford, 1972). Their measure is called the FORCAST
formula after its developers, FORd, CAylor, and STicht. They considered
existing formulas inappropriate for their purpose because
school students and school or general texts had been most often employed
in developing readability formulas. This type of standardization
was believed to make the applicability of prior formulas to technical
publications for adults suspect. Moreover, application of many
of the existing formulas required special grammatical or linguistic
competence on the part of the person attempting to apply the formulas.
Ford, Caylor, and Sticht elected to use cloze score as the criterion of
readability. They believed the cloze test to be more objective than
multiple choice tests or the other more traditional indices of comprehension.
They also pointed out that cloze has "consistently yielded
very high correlations with multiple choice tests and other more subjectively
constructed measures of comprehension and difficulty"
(Caylor, et al., 1972, p. 12).

Additionally, as part of their own work, they found a correlation
of approximately .80 between cloze score on 150-word passages
chosen from the readings required in a wide range of Army jobs and
achieved reading grade level, as measured by the United States Armed
Forces Institute Reading Achievement Test III, Form A, Abbreviated
Edition.

Previous research had indicated that if a cloze score of 35 per 
cent is achieved by a given subject on a particular test passage, then 
it may be reasonably expected that the subject will correctly answer 
approximately 70 per cent of a set of multiple-choice questions based
on that passage. Hence, cloze score appears to be a good indicator of
both comprehension and reading achievement level.

A cloze score of 35 per cent was arbitrarily chosen as the criterion
of potentially adequate comprehension of a text passage. The
reading grade level of a passage was defined as the lowest reading 
grade level (as determined by USAFI Reading Achievement Test score) 
at which 50 per cent of the tested subjects achieved a cloze score of 
35 per cent or higher on that passage. 
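
The definition above amounts to a small search procedure, sketched here
in Python. The function name, the data layout (one pair of tested
reading grade level and cloze score per subject), and the sample data
are assumptions for illustration. The sketch also assumes that subjects
are grouped by whole-number grade level and that "50 per cent" means at
least half of the subjects at that level.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def passage_grade_level(results: Iterable[Tuple[int, float]],
                        cloze_criterion: float = 35.0,
                        pass_fraction: float = 0.5) -> Optional[int]:
    """Lowest reader grade level at which enough subjects reach the cloze criterion.

    `results` holds (tested reading grade level, cloze score in per cent) pairs.
    Returns None when no grade level meets the requirement.
    """
    passed_by_grade = defaultdict(list)
    for grade, cloze in results:
        passed_by_grade[grade].append(cloze >= cloze_criterion)
    for grade in sorted(passed_by_grade):
        passed = passed_by_grade[grade]
        if sum(passed) / len(passed) >= pass_fraction:
            return grade
    return None


# Hypothetical cloze results for one passage.
sample_results = [(6, 20.0), (6, 30.0),
                  (7, 36.0), (7, 28.0), (7, 40.0),
                  (8, 50.0), (8, 45.0)]
level = passage_grade_level(sample_results)
```

In this sample no sixth-grade reader reaches 35 per cent, while two of
the three seventh-grade readers do, so the passage is assigned a
reading grade level of 7.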
A literature search provided Ford, Caylor, and Sticht with
a list of 15 structural properties of text that had been applied in
previous readability formulas and required no special competence or 
equipment to measure. Correlations between cloze score and each of 
the structural properties were computed and several regression equa- 
tions were derived. Their preferred formula employed only a single 
factor, number of one-syllable words per passage. This factor is
very easily measured, and basing the equation on additional factors 
allowed no practical increase in predictive power. 

The FORCAST formula predicts reading grade level as equal to: 

20 - (number of one-syllable words / 10)

The correlation between predicted reading grade level of a passage and
tested reading grade level associated with 35 per cent cloze score was
.87. A subsequent application using new test passages and new subjects
produced a correlation of .77.
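
A minimal sketch of the FORCAST computation follows. It assumes the
count of one-syllable words is taken on a 150-word passage, the passage
length used in the cloze work described earlier; the function name and
the sample counts are hypothetical.

```python
def forcast_grade(one_syllable_words: int) -> float:
    """Reading grade level from the one-syllable word count of a passage.

    Assumption: the count is taken on a 150-word passage.
    """
    return 20 - one_syllable_words / 10


# Hypothetical passages with 100 and 120 one-syllable words.
harder = forcast_grade(100)   # 20 - 10 = 10.0
easier = forcast_grade(120)   # 20 - 12 = 8.0
```

Each additional ten one-syllable words in the passage lowers the
predicted reading grade level by one grade.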

The two last discussed formulas are extremely simple in form. 
However, the inclusion of additional factors in the formulas allowed in- 
consequential gains in predictive power, while greatly increasing the 
effort required in applying the formulas. In contrast, the developers 
of the remaining four readability measures of recent years set out 
with the stated purpose of attempting to provide a method of measur- 
ing readability with minimal effort. 

Smith and Senter (1966) developed a readability equation whose 
data may be collected from mechanical counters easily installed on an 
IBM Selectric typewriter. This technique allows measurement of read- 
ability at essentially rough draft typing speed. Mechanical counters 
are used to record the numbers of key strokes, blank spaces, and
sentences (an equal sign must be typed at the end of each sentence; the
number of activations of this key indicating the number of sentences
typed). From these counts, the mean number of words per sentence
[number of spaces divided by number of sentences (w/s)] and the mean
length of words [number of strokes divided by number of spaces (s/w)]
may be computed. Based on examination of graded school texts, the
regression equation predicting grade level (GL) from the above ratios
is:



GL = 0.50 (w/s) + 4.71 (s/w) - 21.43
This may be simplified to yield the arbitrarily scaled Automated
Readability Index (ARI), equal to (w/s) + 9 (s/w). The authors support
use of the ARI instead of predicted grade level because considerable
variability exists in characteristics of texts written for school
students beyond the junior high school level. Accordingly, a precise
statement of grade equivalent appears inappropriate. It is also pointed
out that readability, as measured by a formula such as this, increases
more slowly with grade level at high levels than at low ones.

Dismissal of prediction of grade level removes the need to 
complicate the formula by attempting to deal with this nonlinearity. 
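
The three counter readings are all that the computation requires, as
the following Python sketch shows. The function and argument names and
the sample counts are hypothetical; the ratios and coefficients follow
the description above, with the number of blank spaces standing in for
the number of words.

```python
from typing import Tuple


def smith_senter_scores(strokes: int, spaces: int, sentences: int) -> Tuple[float, float]:
    """Return (predicted grade level, ARI) from typewriter counter readings."""
    if spaces <= 0 or sentences <= 0:
        raise ValueError("spaces and sentences must be positive")
    words_per_sentence = spaces / sentences   # w/s
    strokes_per_word = strokes / spaces       # s/w
    grade_level = 0.50 * words_per_sentence + 4.71 * strokes_per_word - 21.43
    ari = words_per_sentence + 9 * strokes_per_word
    return grade_level, ari


# Hypothetical counter readings: 500 key strokes, 100 spaces, 5 sentences.
grade_level, ari = smith_senter_scores(strokes=500, spaces=100, sentences=5)
print(round(grade_level, 2), ari)
```

With these readings w/s is 20 and s/w is 5, giving a predicted grade
level of about 12.1 and an ARI of 65.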

As advantages of this procedure of estimating readability,
Smith and Kincaid (1970) pointed out: (1) the speed of data collection
that is possible, (2) the concrete nature of the acquired data, making 
its collection extremely objective and reliable, and (3) the ease with 
which it could be incorporated into modern computerized typesetting 
machinery. 

Coke and Rothkopf (1970) have adapted the Flesch readability 
formula for automatic computation by computer. The determination 
of words per sentence is straightforward in their algorithm, but word 
length in syllables is indexed by number of vowels per word. Using 
these two variables, their program produced Flesch reading ease 
scores that exhibited a correlation of .92 with scores calculated using
the normal procedure. They discount the practical utility of their 
program, but indicate that it is useful for testing the adequacy of text 
sampling procedures. They present a graph showing the probability 
of miscalculating reading ease score by five points or more as a func- 
tion of sample size. 

Fry (1968) presented his readability measure as a simple graph
on which grade level may be looked up, given average sentence length 
in words and syllables. With this presentation, he hoped to reduce 
greatly the amount of time required to compute an index of readabil- 
ity, thereby increasing the popularity of such a measure. His grade 
level designations were based on inspection of textbooks used at vari- 
ous grade levels. The graphic presentation employed permits the 
readabij.ity measure to reflect accurately nonlinearities in the grade 
level function without resorting to higher order equations or arbitrary 
scaling. 
McLaughlin, a psycholinguist, developed a readability formula
which is based on the 1961 revision of the McCall-Crabbs Test
Lessons. His formula is based on a multiplicative rather than addi- 
tive model of factors (McLaughlin, 1969). McLaughlin feels that word 
length, a measure of semantic difficulty, and sentence length, a meas- 
ure of syntactic difficulty, interact with reading difficulty in a manner 
that cannot be accounted for by additive formulas. Interestingly enough, 
McLaughlin found that his readability formula, based on a multiplica- 
tive model, could be computed more easily than any previously existing 
measure. He first points out that the step of multiplication of word 
and sentence lengths may be avoided, since a count of syllables in a 
set number of sentences is an equivalent procedure. He next points 
out that even counting all the syllables in N sentences is unnecessary. 
He found that the number of syllables in 100 words is equal to three 
times the number of words of over two syllables, plus 112. He then 
derived his regression equation, and found that by adjusting tested 
sample size the formula for predicting grade level necessary to an- 
swer 100 per cent of the McCall-Crabbs questions correctly could be 
greatly simplified. A very close approximation to his formula is pre- 
sented: 



SMOG Grade = 3 + square root of (polysyllable count* in 30 sentences)



Predictions from this formula correlated at approximately .70 with
the criterion. The term SMOG grade, or SMOG count, is in tribute
to the FOG count of Gunning, the first application of the count of
the number of polysyllabic words to readability determination, and to
the characteristic atmospheric condition of his home city, London.



* Number of words of three or more syllables.
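
As a worked illustration of the two shortcuts described above, the
sketch below computes McLaughlin's estimate of syllables per 100 words
and the simplified SMOG grade. The constant 3 is taken from
McLaughlin's published simplified formula. The function names are
hypothetical, and the polysyllable count (words of three or more
syllables in 30 sentences) is assumed to have been tallied beforehand.

```python
import math


def estimated_syllables_per_100_words(polysyllables_per_100_words):
    """McLaughlin's shortcut: 3 * (words of over two syllables) + 112."""
    return 3 * polysyllables_per_100_words + 112


def smog_grade(polysyllable_count):
    """Simplified SMOG grade: 3 + sqrt(polysyllable count in 30 sentences)."""
    if polysyllable_count < 0:
        raise ValueError("polysyllable count cannot be negative")
    return 3 + math.sqrt(polysyllable_count)
```

For example, a 30-sentence sample containing 49 polysyllabic words
would receive a SMOG grade of 3 + 7 = 10.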



Cloze Procedures 



The cloze procedure, which has been briefly mentioned
previously, was introduced by Taylor (1953) as a measure of readability
that is free from many of the disadvantages of traditional readability
measures. In the cloze method, samples of text are presented with
some words deleted and replaced by blank spaces. The subject's task
is to fill in the blank spaces with the correct words. The name "cloze"
was applied to this procedure by Taylor because of its resemblance to
the principle of closure of the Gestalt school of psychology: the "human
tendency to complete a familiar but not-quite finished pattern--to 'see'
a broken circle as a whole one . . . by mentally closing the gaps."

In his initial report, Taylor presented data showing that the 
cloze procedure consistently ranked tested "standard" reading passages 
in the same order as the readability formulas of Flesch and of Dale- 
Chall. He also indicated that the rank ordering of cloze scores is main- 
tained regardless of system of word deletion employed, be it every nth 
word or random, with low (10 per cent) or high (20 per cent) rates of 
word deletion. A random or, equivalently, every nth deletion system
is strongly defended: if enough words are deleted, all kinds of words
are deleted in the proportion in which they actually occur in the text.
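
The every nth word deletion scheme can be sketched in a few lines.
This is a minimal illustration rather than Taylor's own procedure: the
function names, the blank marker, and the choice of starting position
are assumptions, and scoring follows the exact-match rule.

```python
def make_cloze(text, n=5, start=None, blank="_____"):
    """Delete every nth word; return the blanked text and the deleted words.

    Words are whitespace-separated tokens, so punctuation stays attached
    to the word it follows (a simplification).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    words = text.split()
    first = n - 1 if start is None else start  # index of first deleted word
    answers = []
    for i in range(first, len(words), n):
        answers.append(words[i])
        words[i] = blank
    return " ".join(words), answers


def cloze_score(responses, answers):
    """Per cent of blanks filled with exactly the deleted word (case ignored)."""
    if not answers or len(responses) != len(answers):
        raise ValueError("need one response for each deleted word")
    correct = sum(
        r.strip().lower() == a.strip().lower() for r, a in zip(responses, answers)
    )
    return 100.0 * correct / len(answers)
```

With n = 5 the deletion rate is 20 per cent, and with n = 10 it is 10
per cent, corresponding to the high and low rates mentioned above.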

Analysis of the effects of scoring only precise matches with the
deleted word, as compared with the more tedious procedure of also
accepting synonyms as correct responses, showed that accepting
synonyms raises all scores equally. Accordingly, there is no effect on
discriminability and the more difficult procedure is not warranted.

Taylor also demonstrated that the cloze test can handle unusual
materials more effectively than readability formulas. Gertrude Stein,
for example, writes in quite short sentences with a fairly simple
vocabulary. However, her style is such that her material is very
difficult to read. This is accurately reflected by the cloze test, but the
Flesch and Dale-Chall formulas rate sample passages taken from her
writings as very easy reading.
Taylor (1957) found correlations of .70 to .80 between cloze
scores and comprehension scores of Air Force trainees reading
typical Air Force technical material. Bormuth (1968) found
correlations of .90 to .96 between cloze scores and scores on tests
of comprehension of passages from the Gray Oral Reading Tests.
Bormuth's questions were of the transformational type, measuring
only retention of "facts" from the passages. In constructing
questions of this type, one word or clause of a statement in the passage
is deleted and replaced by a question marker. The answer to the
question, then, is the deleted element. For example, "The boy rode
the horse," becomes "Who rode the horse?" Bormuth indicated that
limiting the test to questions of this type circumvented the problem
of poor matching of the readability levels of the passages and the tests,
since the questions are determined by the sentences of the passage.

Rankin and Culhane (1969) similarly correlated cloze scores and 
comprehension scores, but without limiting the types of questions in- 
cluded in the comprehension tests. Their test questions included items 
relating to vocabulary, fact, sequence, causal relationship, main idea, 
inference from facts, and author's purpose. They obtained a
correlation of .68 between cloze scores based on excerpts from
encyclopedia articles and comprehension test scores.

Bormuth (1967) determined the cloze scores corresponding to: 
(1) 75 per cent comprehension test score, the comprehension level 
generally considered necessary to allow effective classroom study of 
a text with assistance of a teacher available, and (2) 90 per cent com- 
prehension test score, the level which is considered an indication that 
the text is sufficiently comprehensible to allow effective independent 
study. A 30 per cent cloze score was found to be associated with 75
per cent comprehension test score, and 50 per cent cloze score was
associated with 90 per cent comprehension score. Replication of the
study in the succeeding year (Bormuth, 1968) showed a cloze score
of 44 per cent to be associated with the "classroom level" and 57 per
cent to be associated with the "independent level." Similar
procedures by Rankin and Culhane (1969), using their more difficult type
of comprehension test, as described previously, found cloze scores of
41 and 61 per cent at the comprehension score points of interest. The
correspondence between these results and those of Bormuth is quite
remarkable in view of the fact that Bormuth considered his 1967 results
relating cloze score to 90 per cent comprehension score relatively
invalid, because of ceiling effects present in the comprehension
test used in that study.

Based on these results, Rankin and Culhane concluded that 
it would be appropriate for teachers to consider books on which
pupils cannot attain a cloze score of approximately 40 per cent to be
too difficult for those students.
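
The criterion scores reported above can be applied as a simple lookup.
The sketch below uses the Rankin and Culhane (1969) values of 41 and 61
per cent as default cut-offs. The function name, the level labels, and
the treatment of scores falling exactly on a cut-off are illustrative
assumptions.

```python
def reading_level_from_cloze(cloze_percent,
                             classroom_cutoff=41.0,
                             independent_cutoff=61.0):
    """Classify a cloze score (per cent correct) against two criterion scores."""
    if not 0 <= cloze_percent <= 100:
        raise ValueError("cloze score must be between 0 and 100 per cent")
    if cloze_percent >= independent_cutoff:
        return "independent"    # suitable for independent study
    if cloze_percent >= classroom_cutoff:
        return "classroom"      # suitable with a teacher available
    return "too difficult"
```

The Bormuth (1968) values of 44 and 57 per cent may be supplied in
place of the defaults.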

The cloze procedure has numerous advantages and disadvantages.
Among the advantages pointed out by Taylor (1953) and Klare,
Sinaiko, and Stolurow (1970) are: (1) scoring reliability is very high,
(2) it works well with "non-standard" material, (3) it accounts for in- 
terest and prior knowledge of reader populations, (4) subjects of all 
abilities seem to enjoy it, and (5) test materials are easy to con- 
struct. 

Among its disadvantages are: (1) cloze is a measure, not a
predictor, of readability, requiring testing of sizable samples of
people, (2) it may depend more on knowledge of language than of
subject matter, as Klare et al. (1970) hypothesize, (3) it may not
accurately reflect all types of comprehension, and (4) it may depend
excessively on "short-range constraints"--the four or five words
appearing on each side of the deleted word.
Application of Readability Formulas as "Rules for Writing"

The urge to apply mechanically the traditional readability 
formulas for purposes of improving readability is very strong, but 
may not be appropriate. Smith and Kincaid (1970) point out that 
readability scores are gross measures of difficulty at best and must
not be taken as indices of good or bad writing. A deliberate attempt
to shorten sentence and word length does not necessarily enhance
readability. In fact, readability may be degraded. A more
advantageous approach is to make the writing more logical and precise.
When combined with these principles, consideration of the structural
factors contained in readability formulas may, however, contribute to
readability.

The recommendations of Flesch (1951) are typical of those 
made in hope of improving readability. He suggests that: (1) a
personal type of discourse be adopted, (2) the importance of points
presented be discussed, (3) introductions and summary statements be
included, (4) punctuation be used in such a manner as to help the
reader, (5) points be discussed in chronological order or in order of
increasing importance, and (6) excessive wordiness be avoided. He does
recommend, additionally, that short paragraphs, sentences, and words 
be used. 

The advantage to be gained by careful consideration of the over- 
all structure of a passage of text was demonstrated by Lee (1965). Sig- 
nificantly better learning was found from articles written in a highly 
structured manner compared to "normal" articles or those in which 
paragraph order had been randomized. The highly structured articles
differed from the normal in that they included: (1) an introductory para- 
graph outlining the points to follow, (2) a final summarizing paragraph, 
(3) a number of major and minor headings, and (4) transitional para- 
graphs elaborating on completed and subsequent topics and emphasizing 
the organization of the paper. 




Multimodal Presentation 

The field of information transfer has also turned to the
investigation of multimodal information presentation to enhance and
facilitate the information transfer process. Research in this area
has been mainly devoted to answering the question: Can method X 
transfer information as well as some other method? It has been 
argued (Phillips, 1966) that this is not a worthwhile issue because 
the question assumes that the method against which comparisons are 
being made has been the most effective in transmitting the informa- 
tion. What the research may actually demonstrate is that method X 
may transfer information just as poorly as method Y. Phillips (1966) 
suggests that a more relevant question is: 



Which resource or combination of resources (people, places, media) is
appropriate for teaching what type of subject matter to what type of learner
under what conditions (time, place, size of group, and so on) to achieve 
what purpose? (p. 374). 



Research on improving instructional materials has often
proceeded without reference to any precise theoretical
notions (Briggs, 1966; Lumsdaine, 1964). Theories of learning have
not been taken into account. Even though a great deal of research 
has been performed to improve the effectiveness with which materi- 
als are presented in certain media, when one asks which media will 
be more effective in presenting a certain type of material to a special 
class of learner, one comes to a standstill. 

An experiment by Bourisseau, Davis, and Yamamoto (1965)
demonstrated how assumptions in the audiovisual field often fail to
possess merit. It is generally assumed that in terms of direct
sensory appeal, pictures are superior to printed or spoken words: "A
picture is worth a thousand words." Contrary to this belief,
Bourisseau, Davis, and Yamamoto (1965) demonstrated that pictorial
stimuli are inferior to verbal (printed) stimuli in regard to both the
number of subjects making sensory responses and the total number of
sensory responses evoked.
There have been a number of studies which suggest little or
no payoff from multimodal presentation.

Virag (1971) attempted to assess the effectiveness of three 
modes in transmitting content to students with different aptitudes. 
The three instructional modes were: (1) low verbal (tape-slides and
short film episodes--materials presented at a fixed pace via the
audiovisual communicative channels), (2) high verbal (written case
studies--presented at a self-determined pace through the channel of
print), and (3) conventional (short lectures--presented at a fixed pace
through the audio communicative channel). The results indicated that
no single mode of
instruction was consistently more effective than the other two for any 
particular aptitude pattern. 

In an attempt to prepare an audio-tutorial minicourse, Long
(1970) made use of five methods. The procedures presented the
materials to be learned through: (1) printed text, (2) printed text with
supplementary programmed items, (3) printed text with supplementary 
laboratory manipulations and observations, (4) taped audio program 
with supplementary programmed items, and (5) taped tutorial program 
with laboratory manipulations and observations. No significant differ- 
ences were found to exist between the five instructional methods em- 
ployed. 

Travers (1965) reported a study performed by Van Mondfrans 
in which verbal materials to be learned were presented auditorially,
visually, and in simultaneous audio-visual form. The results
indicated that there was no difference between the single sense channel
presentations and no difference between single and multiple sense
channel presentations. This research supports the earlier
conclusion by Van Mondfrans and Travers (1964) that the use of two sensory
modalities has no advantage over one in the learning of material which 
is redundant across modalities. 

In a study by Goodrich (1971), four types of literary forms or
subjects were presented via four instructional media. The four forms
were: (1) fiction, (2) nonfiction (autobiography), (3) a lecture about
literary symbolism, and (4) a lecture about composition. The four
media were: (1) TV, (2) audio tape, (3) face-to-face lecture, and
(4) text. There were no significant differences between the different
media. Goodrich concluded that the effects of the medium may not be
noticeable under normal learning situations.
Sticht (1969) presented materials of differing levels of
difficulty to subjects of different aptitudes through the visual and
auditory sensory channels. According to Sticht, the results indicated
that listening was as effective as reading in transmitting information
of all three difficulty levels for both average and low aptitude
subjects. Siegel, Barciki and Macpherson (1965), at Applied
Psychological Services,
presented materials to four groups of college students via the auditory 
and visual channels. Two groups, one auditory and one visual, also 
received adjunct programmed materials. The adjunct materials con- 
sisted of multiple choice questions with correct answers. The use of 
the adjunct materials provided feedback to the learner on what infor- 
mation was missed. The results indicated that both audio and visual 
presentation with adjunct programmed materials were significantly 
better than without adjunct materials. There were no significant dif- 
ferences between the different sensory channel presentations. 

On the other hand, a number of studies have suggested some 
gain to accrue from multichannel presentation. 

Singer (1970) investigated comprehension as affected by varying
visual and auditory presentation. The information to be learned
was presented: (1) textually, (2) verbally, and (3) verbally paired with 
reading. Comprehension was poorer when the materials were pre- 
sented only auditorially, but no difference existed between reading and 
reading with listening. 

Severin (1967) presented materials via the auditory, the visual, 
and the audio-visual modes of presentation. The results indicated that
print and print with audio were superior to audio for transferring in- 
formation. There was no difference between print and print with audio. 
These findings were substantiated in a later study (Singer, 1970). 

The results of a recent study (Senour, 1971) concerning the ef- 
fects of student control of audio tape learning indicated that providing 
control functions to the student, i.e., the capability for starting,
stopping, and replaying the tape, aided learner achievement as compared
to the situation which denied them. There was a significantly positive
correlation between learner achievement and the number of times the 
learner elected to use the controls. The subjects reported that they 
did not have to take notes because they could replay the tape until they 
knew it. 
Nelson (1970) reported a study on the effects of visual-auditory
modality preference on learning mode preference. The results
indicated that auditory and visual perceptual subtests of reading
readiness tests are not sufficiently sensitive to discern modality
preferences. Another report (Hueber, 1970) supported the results 
obtained by Nelson (1970). 

If sensory modality preferences do affect information transfer,
a means of determining these preferences must be developed. If audio
tape presentations of information are going to be used, can subjects
be taught to listen more attentively? A large body of research
indicates that training to listen is possible (Brown, 1954; Erikson, 1954;
Irvin, 1953; Nichols, 1949; Lewis, 1956). Other researchers (Erikson,
1954; Irvin, 1954) suggest that low listening ability subjects benefit from
such training more than subjects of high listening ability.



CHAPTER III 




DISCUSSION 



The most important characteristics of the readability
formulas described here, as well as of those formulas that appeared
before 1953 and are considered of at least moderate importance, are
presented in tabular form in Appendix A to this review.

In terms of application of these formulas for predicting
readability, in the late fifties there appeared to be rather general
agreement (Chall, 1958; Klare, 1963; Powers et al., 1958) that the most
precise formula available, and therefore the one to be most generally
recommended, was the Dale-Chall method. If reference to a word
list was to be avoided, the Flesch formula was recommended, unless 
other factors such as special reader populations, types of reading ma- 
terial, or particular advantages of one or another formula warranted 
another choice. It does not seem that this general set of recommenda- 
tions should be changed at this time. 



Estimation of Reading Level 

The majority of the readability formulas predict readability 
in terms of reading grade level. In order to apply them most effec- 
tively, knowledge must be obtained concerning the distribution of read- 
ing grade levels of the expected reader population. The variability of 
reading grade level within school grades and within groups of adults 
having achieved particular levels of education found by Gray and Leary 
(1935) suggests that pure estimation of reading grade level is
somewhat unreliable. Reading grade level may be determined by
administering standard tests of reading ability, of which 37 are reported
as being available in The Sixth Mental Measurements Yearbook (Buros,
1965), to a sample of the expected reading population.
However, in certain areas of application, most notably the
military, a much more efficient, less time consuming, and less
expensive procedure may be employed. It has been found in the
Air Force (Madden & Tupes, 1966) and in the Army (Caylor et al.,
1972) that reading grade level may be estimated from certain
aptitude test scores. Madden and Tupes noted that the general aptitude
index (AI) of the Airman Qualifying Exam (AQE) correlated above .70
with reading level. This is largely due to the inclusion of a reading
vocabulary subtest score within the general AI. Although reading
grade level as measured by the California Test of Reading Vocabulary
and Reading Comprehension was estimable from the general AI alone,
more accurate prediction of an individual's reading grade level could
be made by using a regression equation based on general AI and the
individual's selector AI score. The latter is one of three aptitude
area scores (administrative, mechanical, or electronic) which are
referred to when assigning men to career fields in the Air Force. For
some career fields, the selector variable is the administrative AI;
for others, it is the mechanical AI, while for others, the electronic AI
is employed. Selection to a few career fields is based solely on
general AI. The regression equations appropriate for estimating reading
grade level (RGL) of individuals in career fields which use the selector
AIs for personnel selection are:

administrative: RGL = .0437(GenAI) + .0501(AdAI) + 5.0730
mechanical:     RGL = .0991(GenAI) + .0085(MechAI) + 5.0459
electronic:     RGL = .0743(GenAI) + .0222(ElAI) + 4.6088
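
The three equations above can be applied mechanically, as in the
following sketch. The coefficients are those quoted from Madden and
Tupes (1966). The function name, the dictionary layout, and the sample
AI values in the usage note are hypothetical.

```python
# (general AI coefficient, selector AI coefficient, constant) for each
# selector aptitude area, as quoted in the text.
RGL_COEFFICIENTS = {
    "administrative": (0.0437, 0.0501, 5.0730),
    "mechanical": (0.0991, 0.0085, 5.0459),
    "electronic": (0.0743, 0.0222, 4.6088),
}


def estimate_reading_grade_level(career_area, general_ai, selector_ai):
    """Estimate reading grade level from general AI and selector AI scores."""
    try:
        b_general, b_selector, constant = RGL_COEFFICIENTS[career_area]
    except KeyError:
        raise ValueError(f"unknown career area: {career_area!r}") from None
    return b_general * general_ai + b_selector * selector_ai + constant
```

For instance, an airman in an administrative career field with a
general AI of 60 and an administrative AI of 60 would have an estimated
reading grade level of about 10.7.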



Caylor et al. (1972) similarly produced a regression equation
to predict reading grade level as measured by the U.S. Armed Forces
Institute (USAFI) Reading Achievement Test III, Form A, from
knowledge of an individual's Armed Forces Qualifying Test (AFQT) score.
Their equation is:

RGL = .75(AFQT score) + 5.52.
These regression formulas allow much more confidence to
be placed in statements concerning the appropriate readability lev- 
els of materials intended for use by Army or Air Force enlisted 
personnel. 



Discussion of Formulas 

A number of general criticisms have been applied to readabil- 
ity formulas. The definition of the criterion of comprehensibility has 
been a problem since the earliest studies (Bormuth, 1966). The usu- 
al practice has been to administer multiple choice criterion questions 
just after the passages being tested are read. Lorge (1939) criticized 
this procedure because test performance may be strongly influenced 
by the difficulty of the language of the test questions.

The difficulty of the questions may also be varied, so that the
subject may be able to score highly by simply remembering details of
a passage, or he may be required to determine the author's purpose
in writing the passage and select the best of several highly similar
alternatives.

In addition, Fry (1968) pointed out that reading grade levels
are not rigorously defined, so that different reading tests, especially
those developed at different times, may provide different grade levels
for identical subjects. New reading tests are more difficult than old
ones, indicating that students at given grade levels read better than
their predecessors. He summarizes the problem as one of "trying
to determine grade level when grade level won't stand still." It
seems strange that the various predictions have not been corrected
for criterion unreliability. Also, there is no agreement on the level
of comprehension to be accepted. Fifty and 75 per cent were common
when McCall-Crabbs tests were employed, but the FORCAST formula
uses 70 per cent (35 per cent cloze score) and the SMOG count was
based on 100 per cent comprehension.

Most of the readability formulas do not account for the effects
on readability of the reader's interests, experiences, or aptitudes.
Exceptions to this are the supplementary indices of human interest,
abstraction, realism, energy, and formality-popularity of Flesch,
along with the FORCAST formula, and Jacobson's measures of the
readability of physics and chemistry texts. However, some of these
measures will allow accurate reflection of the readability of material
appearing in professional or technical journals.

Finally, as Bormuth (1966) pointed out, until very recent years 
no theoretical base was available from which to generate testable hy- 
potheses relating to readability. Powerful theories of language behavi- 
or did not exist, so that only the most obvious statistical characteris- 
tics of the written text were studied. 

Although he did not consider it appropriate to present his
formulas in his 1966 paper, Bormuth reported that he had written
regression equations whose predictions correlate up to .93 with cloze
test score. He considers cloze scores to be the only acceptable
criterion of readability currently available. He reported that all of
the variables
in his formulas are new ones generated from modern linguistic theories 
such as those of Chomsky and Yngve. None of the traditional readabil- 
ity variables were powerful enough to be included in his formulas. 

Inter-user and test-retest reliabilities of formulas vary widely. 
Kincaid (1972) found that the test-retest reliability of the McElroy fog
count was only .5. He also pointed out that he gets a headache after
taking a fog count for 30 minutes. The highly objective ARI, however,
has a measured test-retest reliability of over .99 (Huff, 1970). England,
Thomas, and Patterson (1948) found the interrater and test-retest
reliabilities of the Flesch reading ease formula were approximately .90.

The reliabilities of other readability formulas had not been sat- 
isfactorily tested prior to 1958, according to Klare (1963), and work 
of this type has not appeared since that time. 
All of the readability measuring procedures discussed show
roughly similar correlations with their respective criteria of
difficulty. Additionally, all of the criteria seem to be highly
intercorrelated. Hence, from the validity standpoint, there is little basis
for selecting one readability measure over another. In this case, 
utility and practicality for the individual user seem to represent the 
criteria to be employed when selecting a readability measure. As 
has been previously reported, the Dale-Chall and Flesch formulas, 
as revised by Powers, Sumner, and Kearl (1958), are the most highly 
respected of the traditional formulas, but considerations of particular 
characteristics of an individual situation may warrant choice of one 
of the many other available formulas. 

There is, however, some indication that the most powerful
method of measuring readability currently available is the cloze
method. But cloze is less easy to apply than the other techniques.
Application of the cloze method requires preparation of numerous test
forms, assembly of a group of subjects who are representative of the
appropriate reader population, and considerable administration and
scoring time. Also, cloze tests cannot be used to monitor the
readability of material as it is being written.

It has been known since at least the time of the Dale-Tyler
study in 1934 that using readability formulas as a basis for "rules
for writing" is not an effective way to produce readable material.
The recommendations made to writers interested in improving the 
information transfer capability of their output have always been of 
a "qualitative" nature, as opposed to the "quantitative" nature of the 
readability formulas. Recommendations to writers concern such con- 
siderations as stylistic variables, level of abstractness, coherency, 
obscurity of expression, difficulty of ideas expressed, ideational 
density, and so on. These factors are generally ignored by
readability formulas, since they are extremely difficult to measure or even
define. This leaves us in the paradoxical situation of attempting to 
measure readability by measuring factors which do not determine 
reading ease. Readability formulas are based on relatively
unimportant characteristics of text which must be considered almost
entirely artifactual.
Future Research Avenues



The search for more relevant variables and methods of 
measuring them impresses us as the most pressing need for future 
research in the area of readability measurement. It is likely that
modern psycholinguistic theory could be a very stimulating area
from which to draw important readability variables. Applications
of information theory to the study of readability have not proven fruit- 
ful thus far. Such applications have been limited to attempts to meas- 
ure the redundancy of textual material as reflected by the variability 
of responses made in cloze tests. 

In order to measure information transfer using the concepts
of entropy and bits, it is necessary to be able to specify accurately
the stimulus alphabet. In this case, the stimulus alphabet consists of
letters or words, together with the probabilities and conditional
probabilities of occurrence of each symbol from the point of view of
the receiver, or reader. Inability to specify adequately these
probabilities makes the information theoretic approach seem relatively
inappropriate at this time.
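
For concreteness, the entropy of the responses given to a single cloze
blank can be computed as below. This is only a sketch of the standard
Shannon formula, H = -sum(p * log2(p)), applied to observed response
frequencies. Treating those frequencies as the reader's symbol
probabilities is exactly the assumption questioned above, and the
function name is hypothetical.

```python
import math
from collections import Counter


def response_entropy_bits(responses):
    """Shannon entropy, in bits, of the words supplied for one cloze blank."""
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(r.strip().lower() for r in responses)
    total = sum(counts.values())
    # Observed relative frequencies stand in for the unknown probabilities.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Four identical responses give 0 bits, indicating a fully predictable
and therefore highly redundant blank, while responses split evenly
between two different words give 1 bit.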

Research is also needed which focuses on development of a 
manageable criterion of readability. Cloze scores do not represent 
a panacea, but the other criteria available have disadvantages. Com- 
prehension test scores are easily confounded with variable aspects of 
the test questions, and reading grade level is another step removed 
from comprehension, the topic of interest. 

Second, the effects of the additional variables of general intel- 
ligence, interest, and prior experience on readability are not well 
understood. In 1935, Gray and Leary observed that quantitative fac- 
tors related to readability were not the same for good and poor readers. 
But work on this variable has not continued. Few, if any, investigations 
have sought to determine the effects of reader interest and experience 
on readability, although a few formulas have attempted to account for 
these factors, notably the Flesch reading ease formula and the FOR- 
CAST formula. 
Third, the majority of readability formulas have been
validated on school students of various levels. Students comprise the
population of interest in only a small portion of the possible
applications of readability formulas. Hence, more cross-validational
studies are needed which are based on samples of adult readers of
all reading ability levels.

Fourth, no measure of readability is available for evaluation 
of tests or programmed instructional material. These types of
material are being used more and more in our society, and a method of
measuring their readability is greatly needed. 

In all future research, it will be appropriate to evaluate each 
readability formula developed in mathematical forms other than the 
additive linear model traditionally employed. Many authors have 
found nonlinearities in the relationships between their chosen vari- 
ables and criteria. They have dealt with this problem by presenting 
their data graphically, or presenting a table of corrected values for 
the predictions from their formulas. It is likely that nonlinear re- 
gression techniques account for available data in a significantly more 
adequate manner than do formulas of the current, linear variety. 